4 research outputs found
Building MDE cloud services with DISTIL
Also published online by CEUR Workshop Proceedings (CEUR-WS.org, ISSN 1613-0073).
Model-Driven Engineering (MDE) techniques, like
transformations, queries, and code generators, were devised for
local, single-CPU architectures. However, the increasing complexity
of the systems to be built and their high demands in terms of
computation, memory and storage require more scalable and
flexible MDE techniques, likely using services and the cloud.
Nonetheless, the cost of developing MDE solutions on the cloud
is high without proper automation mechanisms.
In order to alleviate this situation, we present DISTIL, a
domain-specific language to describe MDE services, which is
able to generate (NoSQL-based) repositories for the artefacts of
interest, and skeletons for (single or composite) services, ready
to be deployed in Heroku. We illustrate the approach through
the construction of a repository and a set of cloud-based services
for bentō reusable transformation components.
Work supported by the Spanish Ministry of Economy and Competitiveness (TIN2011-24139, TIN2014-52129-R), the EU Commission (FP7-ICT-2013-10, #611125) and the Community of Madrid (S2013/ICE-3006).
Parallelization of a spectral finite element code. Application to non-destructive ultrasonic testing
The subject of this thesis is to study several ways to optimize the computation time of the high-order spectral finite element method (SFEM). The goal is to improve performance on easily accessible architectures, namely SIMD multicore processors and graphics processors. As the computational kernels are limited by memory accesses (a sign of low arithmetic intensity), most of the optimizations presented aim at reducing and accelerating memory accesses. Improved matrix and vector indexing, a combination of loop transformations, task parallelism (multithreading) and data parallelism (SIMD instructions) are the transformations targeting optimal use of the cache, intensive use of registers, and multicore SIMD parallelization. The results are convincing: the proposed optimizations increase performance (between ×6 and ×11) and speed up the computation (between ×9 and ×16). The implementation coded explicitly with SIMD instructions is up to ×4 faster than the auto-vectorized implementation. The GPU implementation is two to three times faster than the CPU one, and a high-speed NVLink connection would allow better masking of memory transfers. The proposed transformations form a methodology for optimizing compute-intensive codes on common architectures and for making the most of the possibilities offered by multithreading and SIMD instructions.
Spectral finite element code parallelization. Application to non-destructive ultrasonic testing.
The subject of this thesis is to study several ways to optimize the computation time of the high-order spectral finite element method (SFEM). The goal is to improve performance on easily accessible architectures, namely SIMD multicore processors and graphics processors. As the computational kernels are limited by memory accesses (a sign of low arithmetic intensity), most of the optimizations presented aim at reducing and accelerating memory accesses. Improved matrix and vector indexing, a combination of loop transformations, task parallelism (multithreading) and data parallelism (SIMD instructions) are the transformations targeting optimal use of the cache, intensive use of registers, and multicore SIMD parallelization. The results are convincing: the proposed optimizations increase performance (between ×6 and ×11) and speed up the computation (between ×9 and ×16). The implementation coded explicitly with SIMD instructions is up to ×4 faster than the auto-vectorized implementation. The GPU implementation is two to three times faster than the CPU one, and a high-speed NVLink connection would allow better masking of memory transfers. The proposed transformations form a methodology for optimizing compute-intensive codes on common architectures and for making the most of the possibilities offered by multithreading and SIMD instructions.
A fast implementation of a spectral finite elements method on CPU and GPU applied to ultrasound propagation
In this paper we present an optimization of a spectral finite element method implementation. The improvements consist in the modification of the memory layout of the main algorithmic kernels and in the augmentation of the arithmetic intensity via loop transformations. The code has been deployed on multicore SIMD machines and on GPU. Compared to our starting point, i.e. the original scalar sequential code, we achieved a speed-up of ×228 on CPU. We present comparisons with the SPECFEM2D code that demonstrate the good performance of our implementation on similar cases. On GPU, a hybrid solution is investigated.
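The memory-layout change mentioned above typically amounts to switching from an array-of-structures to a structure-of-arrays, so that each kernel streams through contiguous, unit-stride data. The names and fields below are hypothetical, chosen only to illustrate the layout difference, not the paper's actual data structures.

```c
#include <stddef.h>

/* Array-of-structures: fields interleaved in memory. A kernel that
   reads only x touches every third float and wastes cache-line
   bandwidth on the unused y and z fields. */
typedef struct { float x, y, z; } NodeAoS;

static float sum_x_aos(const NodeAoS *nodes, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += nodes[i].x;                /* stride: sizeof(NodeAoS) bytes */
    return s;
}

/* Structure-of-arrays: each field stored contiguously. Loads become
   unit-stride, which is what both the cache and SIMD units prefer. */
typedef struct { float *x, *y, *z; } NodesSoA;

static float sum_x_soa(const NodesSoA *nodes, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += nodes->x[i];               /* stride: sizeof(float) bytes */
    return s;
}
```

Both functions compute the same result; the SoA version simply lets every byte fetched from memory carry useful data, which is the point of a layout transformation when kernels are memory-bound.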